surprising that for all the systems biology properties discussed in Sect. 5.1, we can now
tell through concrete data analysis which molecules are involved in the individual
feedback loops, in signalling cascades and in the individual building units (modules).
But larger contours are also becoming visible. For example, the role of RNA as an
important level of cell regulation was long underestimated and has only been fully
recognised in recent years, with the discovery of lncRNAs (long non-coding RNAs) and
miRNAs (microRNAs) in higher cells and sRNAs (small RNAs) in bacteria. An important
lncRNA, Xist RNA, inactivates the second X chromosome in females and is therefore
involved in this fundamental difference between males and females. In contrast,
miRNA-21 represses phosphatases such as PTEN and stimulates tumor growth, which makes
it an important tumor marker. To understand this new level of cellular regulation,
integrative bioinformatic analysis of the transcriptome (and its interplay with other
omics domains) is a crucial prerequisite (see, e.g., two of our papers, Fuchs et al.
2020 and Stojanović et al. 2020, which link RNA and proteome to miRNA regulation in
cardiac and pulmonary fibrosis).
A second example of where a deeper understanding of the design principles of our cells
is applied is tissue replacement by artificial tissue or stem cells. Here, bioinformatics
is essential to uncover signaling pathways and to generate suitable tissue or reprogram
stem cells.
Another current application of the cell’s design principles is protein design:
bioinformatics and experiments systematically change protein structures to investigate
how a protein acquires new properties. With the large number of protein structures now
available (e.g. 3D coordinates from the PDB database), this works well enough to be used
more and more actively. First of all, the protein structure has to be predicted. This
works particularly well with a template (a protein with a known structure; “homology
modelling”), for example using the SWISS-MODEL software (Waterhouse et al. 2018). All
known structural domains in a protein can be found with AnDOM (3D domain annotation).
If the similarity to a known protein structure is insufficient (approximately 62%
same/similar amino acids), one can determine the best matching structure by threading the
sequence onto all known structures (“threading”; e.g., the I-TASSER server, Zheng et al.
2019a, or LOMETS, Zheng et al. 2019b), or by protein folding simulations (“ab initio”;
e.g., the QUARK server; Zheng et al. 2019a).
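The choice between template-based modelling and the fallback methods described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names `percent_identity` and `choose_strategy` are hypothetical, the two sequences are assumed to be already aligned (gaps written as `-`), only identical residues are counted (not similar ones), and the similarity threshold is left as a parameter that the reader must set.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percentage of identical residues over all gap-free alignment columns.

    Assumes both strings come from the same pairwise alignment, so they
    have equal length and use '-' as the gap character.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if q == t:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def choose_strategy(identity: float, threshold: float) -> str:
    """Pick a modelling route from the identity to the best template.

    The threshold (in percent) is an assumption supplied by the caller.
    """
    if identity >= threshold:
        return "homology modelling"
    return "threading or ab initio"


# Toy alignment, invented for illustration only.
identity = percent_identity("ACDE-G", "ACDF-G")
print(identity, choose_strategy(identity, threshold=62.0))
```

In practice the alignment would come from a database search against known structures, and a similarity score based on a substitution matrix would be closer to the "same/similar amino acids" criterion than plain identity.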
This is followed by the design step: for about three decades, ligands and
pharmaceuticals have been optimized to better fit the protein structure, e.g. the
receptor. Drugs against HIV infection have often been obtained by design. More recently,
protein structures are actively incorporated into simulations and predictions, supported
by high-throughput experimental methods (Lam et al. 2018; Dominguez et al. 2017), and
catalysis in enzymes and receptor function are understood better and better
(Mahalapbutr et al. 2020; Sgrignani et al. 2020). However, knowledge of protein
structure can also be used to selectively alter the protein structure itself, for
example to improve enzyme activities (Leman et al. 2020; Rosetta software) and to
systematically change protein building units, even to combine them into logic circuits
(Chen et al. 2020); adding or swapping secondary structure elements in particular is
now easy.
11 Design Principles of a Cell